
    Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

    Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and on estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates, not just by providing better estimates of the frequencies themselves but also by improving the estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the standard approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the standard-style estimators.
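
    The following Python sketch illustrates the bias. It rests on three assumptions: the universal genetic code, toy codon data, and the widely used position-specific estimator (commonly called F3×4). In that estimator, codon frequencies are products of the nucleotide frequencies observed at each codon position, renormalized over the 61 sense codons. All three stop codons begin with T, so dropping them pushes the implied first-position T frequency below the observed one. The `corrected_parameters` function is a hypothetical stand-in for a corrected estimator. It uses iterative proportional fitting to find position-specific parameters whose sense-codon distribution reproduces the observed composition. This is not necessarily how the note computes its correction.

```python
from itertools import product

NUCS = "ACGT"
# Stop codons of the universal genetic code (assumed here).
STOPS = {"TAA", "TAG", "TGA"}
SENSE = ["".join(c) for c in product(NUCS, repeat=3) if "".join(c) not in STOPS]


def position_frequencies(codons):
    """Observed nucleotide frequencies at codon positions 1, 2 and 3."""
    counts = [dict.fromkeys(NUCS, 0) for _ in range(3)]
    for codon in codons:
        for pos, nuc in enumerate(codon):
            counts[pos][nuc] += 1
    return [{n: c[n] / len(codons) for n in NUCS} for c in counts]


def product_codon_frequencies(params):
    """Codon frequencies proportional to the product of per-position
    nucleotide parameters, renormalized over the 61 sense codons."""
    raw = {c: params[0][c[0]] * params[1][c[1]] * params[2][c[2]] for c in SENSE}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def implied_position_frequencies(codon_freqs):
    """Nucleotide composition at each position implied by codon frequencies."""
    implied = [dict.fromkeys(NUCS, 0.0) for _ in range(3)]
    for codon, f in codon_freqs.items():
        for pos, nuc in enumerate(codon):
            implied[pos][nuc] += f
    return implied


def corrected_parameters(observed, max_iter=10000, tol=1e-12):
    """Hypothetical correction: rescale each position's parameters in turn
    (iterative proportional fitting) until the implied sense-codon
    composition matches the observed composition."""
    params = [dict(p) for p in observed]
    for _ in range(max_iter):
        worst = 0.0
        for pos in range(3):
            implied = implied_position_frequencies(product_codon_frequencies(params))
            for n in NUCS:
                m = implied[pos][n]
                worst = max(worst, abs(m - observed[pos][n]))
                if m > 0:
                    params[pos][n] *= observed[pos][n] / m
        if worst < tol:
            break
    return params


# Toy alignment column data (hypothetical), all sense codons.
alignment_codons = ["ATG", "TTT", "GCA", "TAC", "TGG", "AAA",
                    "CCG", "GAT", "CTC", "ACT", "GGA", "TCG"]
observed = position_frequencies(alignment_codons)
standard = implied_position_frequencies(product_codon_frequencies(observed))
corrected = implied_position_frequencies(
    product_codon_frequencies(corrected_parameters(observed)))
print("observed T at position 1:", observed[0]["T"])
print("standard estimator implies:", standard[0]["T"])
print("corrected estimator implies:", corrected[0]["T"])
```

    Iterative proportional fitting is used here because each update exactly restores one codon position's composition while keeping the product form. A direct root-finding solve of the same equations would serve equally well.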

    Modeling HIV-1 Drug Resistance as Episodic Directional Selection

    The evolution of substitutions conferring drug resistance to HIV-1 is both episodic, occurring when patients are on antiretroviral therapy, and strongly directional, with site-specific resistant residues increasing in frequency over time. While methods exist to detect episodic diversifying selection and continuous directional selection, no evolutionary model combining these two properties has been proposed. We present two models of episodic directional selection (MEDS and EDEPS) that allow the a priori specification of lineages expected to have undergone directional selection. The models infer the sites and target residues that were likely subject to directional selection, using either codon or protein sequences. Compared to its null model of episodic diversifying selection, MEDS provides a superior fit for most sites known to be involved in drug resistance. Neither a test for episodic diversifying selection nor a test for constant directional selection detects as many true positives as MEDS and EDEPS while maintaining acceptable levels of false positives. This suggests that episodic directional selection is a better description of the process driving the evolution of drug resistance.

    CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

    Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of the amino acids being exchanged. Recent developments have shown that models which allow amino acid pairs to have independent rates of substitution offer improved fit over single-rate models. However, these approaches have been limited by the need for large alignments for their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be grouped into a number of rate classes that depends on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution. Gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are therefore preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering. If so, this clustering will be useful for the evolutionary fingerprinting of genes.
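
    The following Python sketch gives a sense of how large the model space is, assuming the universal genetic code. It counts the unordered amino acid pairs that a single nucleotide substitution between sense codons can interchange. It then counts the ways to split n such pairs into exactly K non-empty rate classes. Treating the classes as unlabeled is our simplifying assumption, and under it the count is the Stirling number of the second kind, S(n, K). The sketch only sizes the search space; it is not the GA itself.

```python
from itertools import product

BASES = "TCAG"
# Universal genetic code in TCAG order (assumed); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_nucleotide_pairs(code):
    """Unordered amino acid pairs linked by one nucleotide change between sense codons."""
    pairs = set()
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                other = code[codon[:pos] + b + codon[pos + 1:]]
                if other not in ("*", aa):
                    pairs.add(frozenset((aa, other)))
    return pairs


def stirling2(n, k):
    """Ways to partition n labeled items into k non-empty unlabeled classes."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            # Item i either joins one of j existing classes or starts a new one.
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


n_pairs = len(single_nucleotide_pairs(CODE))
for k in (2, 3, 5):
    print(f"{n_pairs} pairs into {k} classes: {stirling2(n_pairs, k)} models")
```

    Since S(n, 2) = 2^(n-1) - 1, even the two-class case is already enormous for any realistic number of pairs. This is why a heuristic search such as a GA, rather than exhaustive enumeration, is needed.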

    HIV-Specific Probabilistic Models of Protein Evolution

    Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models generalize to a very different organism such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model-fitting procedure and applied it to a collection of HIV-1 alignments sampled from different viral genes. From these, we inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the accompanying software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments. This demonstrates evolutionary biases in amino acid substitution that are unique to HIV and are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error. We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as hepatitis C and influenza A viruses.
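
    A common way to turn a reversible substitution model into an alignment scoring matrix is a log-odds score. The score is s_ab = log2(pi_a P_ab(t) / (pi_a pi_b)) = log2(P_ab(t) / pi_b), then scaled and rounded. The abstract does not say exactly how its matrices are derived, so the Python sketch below is illustrative only. In place of the inferred HIV models, it uses a hypothetical equal-input (F81-style) model on a toy three-letter alphabet, for which P(t) has a closed form.

```python
import math


def equal_input_transition(freqs, t):
    """P_ab(t) for an equal-input model, scaled to one expected substitution per unit t."""
    mu = 1.0 / (1.0 - sum(p * p for p in freqs.values()))
    decay = math.exp(-mu * t)
    # P_ab(t) = decay * [a == b] + (1 - decay) * pi_b
    return {
        (a, b): (decay if a == b else 0.0) + (1.0 - decay) * freqs[b]
        for a in freqs
        for b in freqs
    }


def log_odds_scores(freqs, t, scale=2.0):
    """Scaled log-odds scores, scale * log2(P_ab(t) / pi_b), rounded to integers."""
    p = equal_input_transition(freqs, t)
    return {pair: round(scale * math.log2(prob / freqs[pair[1]])) for pair, prob in p.items()}


toy_freqs = {"A": 0.5, "B": 0.3, "C": 0.2}  # hypothetical alphabet and frequencies
scores = log_odds_scores(toy_freqs, t=0.5)
for a in toy_freqs:
    print(a, [scores[(a, b)] for b in toy_freqs])
```

    Because the model is reversible, the scores are symmetric. Identities involving rarer residues score higher, mirroring the behavior of familiar log-odds matrices. A real empirical model would need a numerical matrix exponential to compute P(t).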

    Phylodynamic Reconstruction Reveals Norovirus GII.4 Epidemic Expansions and their Molecular Determinants

    Noroviruses are the most common cause of viral gastroenteritis. The number of globally reported norovirus outbreaks increased over the past decade, especially outbreaks caused by successive genogroup II genotype 4 (GII.4) variants. It remained unclear whether this increase reflected an upswing in the number of infections or a surveillance artifact caused by heightened awareness and concomitantly improved reporting. We therefore set out to study the population structure of GII.4 strains detected through systematic outbreak surveillance since the early 1990s, and how it has changed over time. We collected 1383 partial polymerase and 194 full capsid GII.4 sequences. A Bayesian MCMC coalescent analysis revealed an increase in the number of GII.4 infections during the last decade. The GII.4 strains included in our analyses evolved at a rate of 4.3–9.0 × 10⁻³ mutations per site per year and share a most recent common ancestor in the early 1980s. Determinants of adaptation in the capsid protein were studied using different maximum likelihood approaches to identify sites subject to diversifying or directional selection and sites that co-evolved. A number of the computationally identified adaptively evolving sites were on the surface of the capsid and possibly subject to immune selection. We also detected sites that were subject to constrained or compensatory evolution due to secondary RNA structures relevant to virus replication. We highlight codons that may prove useful in identifying emerging novel variants and, using these, indicate that the novel 2008 variant is more likely to cause a future epidemic than the 2007 variant. Norovirus infections are generally mild and self-limiting, but more severe outcomes frequently occur in elderly and immunocompromised people, and no treatment is available. The observed pattern of continually emerging novel GII.4 variants, causing elevated numbers of infections, is therefore a cause for concern.

    Restricted Genetic Diversity of HIV-1 Subtype C Envelope Glycoprotein from Perinatally Infected Zambian Infants

    Background: Mother-to-child transmission of HIV-1 remains a significant problem in resource-constrained settings where antiretroviral therapy is still not widely available. Understanding the earliest events during HIV-1 transmission and characterizing the newly transmitted, or founder, virus is central to intervention efforts. In this study, we analyzed the viral env quasispecies of six mother-infant transmission pairs (MIPs). We characterized the genetic features of the envelope glycoprotein that could influence HIV-1 subtype C perinatal transmission. Methodology and Findings: The V1-V5 region of env was amplified from baseline samples of the six MIPs, and 334 DNA sequences in total were analyzed. A comparison of the viral populations derived from mothers and infants revealed a severe genetic bottleneck during perinatal transmission, characterized by low sequence diversity in the infants. Phylogenetic analysis indicates that, in all of our infant subjects, infection was most likely established by a single founder virus. Furthermore, the newly transmitted viruses in the infants had significantly fewer potential N-linked glycosylation sites in the Env V1-V5 region. They also tended to encode shorter variable loops than the nontransmitted viruses. In addition, a similar intensity of selection was seen in mothers and infants, with a higher rate of synonymous (dS) than nonsynonymous (dN) substitutions (dN/dS < 1). Conclusions: Our results indicate that a strong genetic bottleneck occurs during perinatal transmission of HIV-1 subtype C. This is evident from population diversity and from phylogenetic patterns in which a single viral variant appears to be responsible for infection in the infants. As a result, the newly transmitted viruses are less diverse and harbor significantly less glycosylated envelopes. This suggests that viruses with restricted glycosylation of the envelope glycoprotein appear to be preferentially transmitted during HIV-1 subtype C perinatal transmission. Our findings also indicate that purifying selection appears to predominate in shaping the early intrahost evolution of HIV-1 subtype C envelope sequences.
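
    Potential N-linked glycosylation sites (PNGS) are commonly identified by the sequon N-X-S/T, where X is any residue except proline. The Python sketch below counts sequons in an ungapped protein sequence using that rule. The abstract does not state which rule or tool was used, and some tools also exclude a proline immediately after the S/T. The example sequence is hypothetical.

```python
import re

# N, then any residue except P, then S or T. The zero-width lookahead lets
# overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def pngs_positions(protein):
    """1-based positions of N residues that start an N-X-S/T sequon (X != P).

    Expects an ungapped sequence; alignment gaps should be removed first.
    """
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


example = "ANCTNPSNNSTQ"  # hypothetical Env fragment
print(pngs_positions(example), len(pngs_positions(example)))
```

    Comparing such counts between maternal and infant variable-loop sequences is one simple way to quantify the glycosylation difference described above.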

    Phylogeography of Japanese encephalitis virus: genotype is associated with climate

    The circulation of vector-borne zoonotic viruses is largely determined by the overlap in the geographical distributions of virus-competent vectors and reservoir hosts. Less clear are the factors influencing the distribution of virus-specific lineages. Japanese encephalitis virus (JEV) is the most important etiologic agent of epidemic encephalitis worldwide. It is primarily maintained between vertebrate reservoir hosts (avian and swine) and culicine mosquitoes. There are five genotypes of JEV, GI through GV. In recent years, GI has displaced GIII as the dominant JEV genotype, and GV has re-emerged after almost 60 years of undetected circulation. JEV is found throughout most of Asia, extending from maritime Siberia in the north to Australia in the south, and from Pakistan in the west to Saipan in the east. Transmission of JEV in temperate zones is epidemic, with the majority of cases occurring in the summer months, while transmission in tropical zones is endemic and occurs year-round at lower rates. To test the hypothesis that viruses circulating in these two geographical zones are genetically distinct, we applied Bayesian phylogeographic, categorical data analysis and phylogeny-trait association test techniques to the largest JEV dataset compiled to date. It represents the envelope (E) gene of 487 isolates collected from 12 countries over 75 years. We demonstrate that GIII and the recently emerged GI-b are temperate genotypes likely maintained year-round in northern latitudes. By contrast, GI-a and GII are tropical genotypes likely maintained primarily through mosquito-avian and mosquito-swine transmission cycles. This study represents a new paradigm directly linking viral molecular evolution and climate.
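
    Phylogeny-trait association tests ask whether a trait, such as temperate versus tropical origin, clusters on a tree more than expected by chance. One common statistic is the parsimony score, the minimum number of trait changes on the tree, computed with Fitch's algorithm. It is compared against a null distribution obtained by shuffling tip labels. The abstract does not specify which statistics were used, and such tests are often run over a posterior sample of trees. The Python sketch below instead uses a single fixed, hypothetical tree with hypothetical zone labels.

```python
import random


def fitch_score(tree, traits):
    """Minimum number of trait changes on a rooted binary tree (Fitch parsimony).

    A tree is either a tip name (str) or a (left, right) tuple of subtrees.
    """
    def visit(node):
        if isinstance(node, str):
            return {traits[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # No shared state: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def permutation_p_value(tree, traits, n_perm=1000, seed=1):
    """Return the observed parsimony score and a one-sided permutation p-value
    (with the usual +1 correction) from shuffling trait labels across tips."""
    rng = random.Random(seed)
    observed = fitch_score(tree, traits)
    tips = list(traits)
    labels = list(traits.values())
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if fitch_score(tree, dict(zip(tips, labels))) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical tree of six isolates labeled by climate zone.
tree = ((("t1", "t2"), "t3"), (("p1", "p2"), "p3"))
zones = {"t1": "temperate", "t2": "temperate", "t3": "temperate",
         "p1": "tropical", "p2": "tropical", "p3": "tropical"}
score, p = permutation_p_value(tree, zones)
print(score, round(p, 3))
```

    A parsimony score lower than most shuffled scores (a small p-value) indicates that the trait is more phylogenetically clustered than expected by chance.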

    Evolutionary Dynamics and Emergence of Panzootic H5N1 Influenza Viruses

    The highly pathogenic avian influenza (HPAI) H5N1 virus lineage has undergone extensive genetic reassortment with viruses from different sources to produce numerous H5N1 genotypes. It has also developed into multiple genetically distinct sublineages in China. From there, the virus has spread to over 60 countries. The virus has had ecological success in diverse species of both poultry and wild birds and is frequently introduced into humans, which suggests that it is a likely source of the next human pandemic. The evolutionary and ecological characteristics of its emergence from wild birds into poultry are therefore of considerable interest. Here, we apply the latest analytical techniques to infer the early evolutionary dynamics of H5N1 virus in the population from which it emerged (wild birds and domestic poultry). We estimated the time of the most recent common ancestor of each gene segment. From this, we show that the H5N1 prototype virus was likely introduced from wild birds into poultry as a non-reassortant low pathogenic avian influenza H5N1 virus, and was not generated by reassortment in poultry. In contrast, more recent H5N1 genotypes were generated locally in aquatic poultry after the introduction of the prototype virus (A/goose/Guangdong/1/96), i.e., they were not the result of additional emergence from wild birds. We show that the H5N1 virus was introduced into Indonesia and Vietnam 3–6 months before the first outbreaks in those countries were detected. Population dynamics analyses revealed a rapid increase in the genetic diversity of A/goose/Guangdong/1/96 lineage viruses from mid-1999 to early 2000. Our results suggest that the transmission of reassortant viruses through the mixed poultry population in farms and markets in China has selected HPAI H5N1 viruses that are well adapted to multiple hosts. This transmission has also reduced the interspecies transmission barrier of those viruses.